
Conversation

@lutter (Collaborator) commented Jan 16, 2026

Replaces std::sync::RwLock with parking_lot::RwLock for pool metrics

Use parking_lot::RwLock instead of std::sync::RwLock for connection pool metric recording. parking_lot::RwLock is faster for short-held locks as it uses efficient spinning before parking, reducing tokio worker thread blocking during metric recording.

This change helps reduce tokio threadpool contention when the connection pool is under heavy load, as the metric recording locks are held for only microseconds.
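For context, a minimal sketch of what the swap looks like at a call site, assuming a metrics struct along these lines (`PoolMetrics` and `wait_stats` are illustrative names, not the PR's actual code):

```rust
// Sketch only: the struct and field are hypothetical stand-ins.
use std::time::Duration;

use parking_lot::RwLock;

struct PoolMetrics {
    wait_stats: RwLock<Vec<Duration>>,
}

impl PoolMetrics {
    /// Record one checkout wait. The lock is held only for the push, so
    /// parking_lot's brief spin usually resolves contention without ever
    /// parking the tokio worker thread.
    fn record_wait(&self, wait: Duration) {
        self.wait_stats.write().push(wait);
        // The std::sync::RwLock equivalent needs an unwrap for poisoning:
        //   self.wait_stats.write().unwrap().push(wait);
        // parking_lot has no poisoning, which also simplifies call sites.
    }
}
```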

lutter and others added 8 commits January 16, 2026 12:33
…pool metrics

Use parking_lot::RwLock instead of std::sync::RwLock for connection pool
metric recording. parking_lot::RwLock is faster for short-held locks as
it uses efficient spinning before parking, reducing tokio worker thread
blocking during metric recording.

This change helps reduce tokio threadpool contention when the connection
pool is under heavy load, as the metric recording locks are held for
only microseconds.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
These locks are accessed on every GraphQL query, so using the faster
parking_lot::RwLock reduces lock contention in the query path.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
…ents

Replace std::sync::RwLock with parking_lot::RwLock in the
SubscriptionManager to reduce lock contention. parking_lot's RwLock
is faster for short-held locks due to efficient spinning before
parking, which helps reduce tokio threadpool contention.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Replace std::sync::RwLock with parking_lot::RwLock in the background
writer's Request::Write batch handling. This reduces lock contention
as parking_lot's RwLock is faster for short-held locks due to
efficient spinning before parking.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Replace std::sync::RwLock with parking_lot::RwLock in TimedCache for
faster lock acquisition on cache gets and sets. parking_lot's RwLock
uses efficient spinning before parking, reducing contention.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Replace std::sync::RwLock with parking_lot::RwLock for the chain
stores map in BlockStore. This reduces lock contention when looking
up or modifying chain stores.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
…egistry

Replace std::sync::RwLock with parking_lot::RwLock for the global
metrics caches in MetricsRegistry. This reduces lock contention when
registering or looking up global metrics.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
…eepAlive

Replace std::sync::RwLock with parking_lot::RwLock for the alive_map
in SubgraphKeepAlive. This reduces lock contention when tracking
running subgraph deployments.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@lutter force-pushed the lutter/block branch 2 times, most recently from ab17afd to a6245db on January 16, 2026 at 22:38
With many subgraphs, chain_head_ptr() was querying the database on every
call, leading to connection pool saturation. This adds an adaptive cache
that learns optimal TTL from observed block frequency.

The cache uses EWMA to estimate block time and sets TTL to 1/4 of that
estimate (bounded by 20ms-2000ms). During warmup (first 5 blocks), it
uses the minimum TTL to avoid missing blocks on unknown chains.

New metrics:
- chain_head_ptr_cache_hits: cache hit counter
- chain_head_ptr_cache_misses: cache miss counter (DB queries)
- chain_head_ptr_cache_block_time_ms: estimated block time per chain

Safety escape hatch: set GRAPH_STORE_DISABLE_CHAIN_HEAD_PTR_CACHE=true
to revert to the previous uncached behavior.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
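A minimal sketch of the adaptive TTL logic described above: the 20ms-2000ms bounds, the 5-block warmup, and TTL = block time / 4 are from the commit message, while the names and the EWMA smoothing factor are illustrative assumptions:

```rust
use std::time::Duration;

const MIN_TTL: Duration = Duration::from_millis(20);
const MAX_TTL: Duration = Duration::from_millis(2000);
const WARMUP_BLOCKS: u32 = 5;
const EWMA_ALPHA: f64 = 0.2; // assumed smoothing factor, not from the PR

struct AdaptiveTtl {
    blocks_seen: u32,
    block_time_ms: f64, // EWMA estimate of inter-block time
}

impl AdaptiveTtl {
    fn new() -> Self {
        Self { blocks_seen: 0, block_time_ms: 0.0 }
    }

    /// Fold one observed inter-block interval into the EWMA estimate.
    fn observe(&mut self, interval: Duration) {
        let ms = interval.as_millis() as f64;
        self.block_time_ms = if self.blocks_seen == 0 {
            ms
        } else {
            EWMA_ALPHA * ms + (1.0 - EWMA_ALPHA) * self.block_time_ms
        };
        self.blocks_seen += 1;
    }

    /// TTL for the cached chain head pointer: the minimum during warmup
    /// (so unknown chains don't miss blocks), then a quarter of the
    /// estimated block time, clamped to the configured bounds.
    fn ttl(&self) -> Duration {
        if self.blocks_seen < WARMUP_BLOCKS {
            return MIN_TTL;
        }
        let ttl = Duration::from_millis((self.block_time_ms / 4.0) as u64);
        ttl.clamp(MIN_TTL, MAX_TTL)
    }
}
```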
lutter and others added 2 commits January 17, 2026 15:47
Replace RwLock<MovingStats> with a lock-free AtomicMovingStats that uses
an atomic ring buffer with packed bins. Each bin packs epoch (32 bits),
count (32 bits), and duration_nanos (64 bits) into a single AtomicU128
for lock-free CAS updates.

This eliminates lock contention when many threads write concurrently
(every semaphore wait, connection checkout, query execution) while
halving memory usage (4.8KB vs. 9.6KB per stats instance).

Co-Authored-By: Claude Opus 4.5 <[email protected]>
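A minimal sketch of one packed bin and its CAS update, assuming the portable-atomic crate for the 128-bit atomic (std has no stable AtomicU128); the bin layout matches the commit message, everything else is illustrative:

```rust
use portable_atomic::{AtomicU128, Ordering};

/// One ring-buffer bin: epoch (high 32 bits) | count (32 bits) |
/// duration_nanos (low 64 bits), all updated in a single CAS.
struct Bin(AtomicU128);

impl Bin {
    fn new() -> Self {
        Bin(AtomicU128::new(0))
    }

    fn pack(epoch: u32, count: u32, nanos: u64) -> u128 {
        ((epoch as u128) << 96) | ((count as u128) << 64) | nanos as u128
    }

    fn unpack(v: u128) -> (u32, u32, u64) {
        ((v >> 96) as u32, (v >> 64) as u32, v as u64)
    }

    /// Record one sample in this bin for `epoch`, resetting the bin first
    /// if it still holds data from an older epoch. Lock-free: concurrent
    /// writers simply retry on CAS failure instead of blocking.
    fn record(&self, epoch: u32, nanos: u64) {
        let mut cur = self.0.load(Ordering::Relaxed);
        loop {
            let (e, count, total) = Self::unpack(cur);
            let next = if e == epoch {
                Self::pack(epoch, count + 1, total + nanos)
            } else {
                Self::pack(epoch, 1, nanos) // stale bin: start a fresh window
            };
            match self
                .0
                .compare_exchange_weak(cur, next, Ordering::AcqRel, Ordering::Relaxed)
            {
                Ok(_) => return,
                Err(actual) => cur = actual,
            }
        }
    }
}
```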
The ChainHeadPtrCache introduced in 7ecdbda can cause connection pool
exhaustion when the cache expires: multiple concurrent callers each
acquire a database connection, then block waiting for a write lock to
update the cache - while still holding their connections.

This adds a HerdCache layer that ensures only one caller queries the
database when the TTL cache expires. Other concurrent callers await
the in-flight query result instead of each acquiring their own
connection.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
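A minimal single-flight sketch of the herd-protection idea, built from the futures crate's `Shared`; this is the shape of the technique, not graph-node's actual HerdCache API:

```rust
use std::sync::Mutex;

use futures::future::{BoxFuture, FutureExt, Shared};

type SharedQuery<T> = Shared<BoxFuture<'static, T>>;

struct SingleFlight<T: Clone> {
    in_flight: Mutex<Option<SharedQuery<T>>>,
}

impl<T: Clone + Send + Sync + 'static> SingleFlight<T> {
    fn new() -> Self {
        Self { in_flight: Mutex::new(None) }
    }

    /// Run `query` at most once per expiry: the first caller becomes the
    /// leader and executes it; concurrent callers await a clone of the same
    /// shared future instead of each taking a database connection.
    async fn run<F>(&self, query: F) -> T
    where
        F: std::future::Future<Output = T> + Send + 'static,
    {
        let (fut, is_leader) = {
            let mut guard = self.in_flight.lock().unwrap();
            match guard.as_ref() {
                Some(f) => (f.clone(), false),
                None => {
                    let f = query.boxed().shared();
                    *guard = Some(f.clone());
                    (f, true)
                }
            }
        }; // guard dropped here, before the await
        let result = fut.await;
        if is_leader {
            // Only the leader clears the slot, so the next expiry starts
            // a fresh query rather than reusing a completed one.
            *self.in_flight.lock().unwrap() = None;
        }
        result
    }
}
```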
…cks for active connections

Connections that were used within the last 30 seconds (configurable via
GRAPH_STORE_CONNECTION_VALIDATION_IDLE_SECS) now skip the SELECT 1 health
check during pool recycle. This reduces connection checkout latency from
~4ms to ~0ms for frequently-used connections while still validating idle
connections to detect stale database connections.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
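A minimal sketch of the skip decision, assuming a last-used timestamp on each pooled connection; only the 30-second default and the env var are from the commit message, the types are illustrative:

```rust
use std::time::{Duration, Instant};

// In the real change this default is overridable via
// GRAPH_STORE_CONNECTION_VALIDATION_IDLE_SECS.
const DEFAULT_VALIDATION_IDLE: Duration = Duration::from_secs(30);

struct PooledConn {
    last_used: Instant,
    // ... the underlying database connection
}

impl PooledConn {
    /// Decide whether this connection needs the SELECT 1 health check on
    /// recycle: connections used within `idle_threshold` are trusted as-is,
    /// so hot connections skip the ~4ms round trip, while long-idle ones
    /// are still validated to catch stale database connections.
    fn needs_validation(&self, idle_threshold: Duration) -> bool {
        self.last_used.elapsed() >= idle_threshold
    }
}
```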